Add more property-caching optimizations to x509 Rust backend #14441
Open
The existing load benchmarks create a fresh object each iteration, so
the cache is always cold and caching optimizations show no benefit there.
Add benchmarks that construct the object once and then repeatedly call
the getter, exercising the warm-cache path:
- Certificate: subject, issuer, public_key(), signature_hash_algorithm, signature_algorithm_oid
- CRL: issuer, serial-number lookup (hit and miss)
- OCSPRequest: issuer_name_hash, issuer_key_hash, hash_algorithm, serial_number (all in one bench)
- OCSPResponse: issuer_key_hash, serial_number, signature_hash_algorithm (all in one bench)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Alexander Bokovoy <abokovoy@redhat.com>
…ect caching

The test assumed cert.subject re-parses the Name on every call, so it checked each too-long-country warning in its own pytest.warns block. After subject caching, parse_name runs only once (on the first access) and emits both COUNTRY_NAME and JURISDICTION_COUNTRY_NAME warnings in a single call. Subsequent accesses return the cached Name object without re-parsing, so the second block saw no warnings.

Merge both assertions into a single pytest.warns block, which correctly captures all warnings emitted during the first (and only) parse.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Alexander Bokovoy <abokovoy@redhat.com>
Wrap the attributes getter in PyOnceLock so the expensive loop over ASN.1 attributes (OID conversion, PyBytes allocation, Attributes construction) runs at most once per CertificateSigningRequest object.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Alexander Bokovoy <abokovoy@redhat.com>

Wrap issuer_name_hash, issuer_key_hash, hash_algorithm, and serial_number getters in PyOnceLock so the allocations (PyBytes construction, integer conversion, hash-object instantiation) happen at most once per OCSPRequest object.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Alexander Bokovoy <abokovoy@redhat.com>

…gorithm getter results

Wrap the five most-frequently-accessed computed properties in PyOnceLock so the underlying work (name parsing, public-key loading, OID conversion, hash-algorithm object construction) runs at most once per Certificate object, regardless of how many times callers read the attribute. Also update all Certificate struct construction sites (pkcs7.rs, ocsp_resp.rs) to initialise the new cache fields.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Alexander Bokovoy <abokovoy@redhat.com>
The old implementation used index-based .nth(i) over a freshly-cloned iterator per certificate, making the total work O(n²) in the number of embedded certs. Also, each call rebuilt the Python list from scratch. Replace with a single linear pass using asn1::write_single to obtain independent DER bytes for each certificate (avoiding the need for the unsafe map_arc_data_ocsp_response helper), then wrap the built PyList in a PyOnceLock so subsequent calls return the cached object.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Alexander Bokovoy <abokovoy@redhat.com>
OCSPSingleResponse lacked an extensions getter entirely. Add one backed by a PyOnceLock so the extension-parsing work runs at most once per response object. Handles SCT and CRL entry extensions via the shared parse_and_cache_extensions helper.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Alexander Bokovoy <abokovoy@redhat.com>

Wrap the issuer, signature_algorithm_oid, and signature_hash_algorithm getters in PyOnceLock so name parsing and OID/hash-object construction each run at most once per CertificateRevocationList object.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Alexander Bokovoy <abokovoy@redhat.com>
get_revoked_certificate_by_serial_number previously iterated over every revoked certificate on each call (O(n)). Build a HashMap<Vec<u8>, OwnedRevokedCertificate> on first call using the existing iterator infrastructure, then answer subsequent lookups in O(1). Also removes the now-unused try_map_crl_to_revoked_cert unsafe helper.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Alexander Bokovoy <abokovoy@redhat.com>
…icates caching

OCSPSingleResponse.extensions was added in commit 986298b but had no tests. Add four tests in TestOCSPResponse:

* test_single_response_extensions_empty – a typical response with no per-SingleResponse extensions returns an empty Extensions object, and the result is the same cached object on repeated access.
* test_single_response_extensions_sct – resp-sct-extension.der carries an SCT list in the raw_single_extensions field; verify it is exposed via the new getter on the OCSPSingleResponse iterator item.
* test_single_response_extensions_reason – resp-single-extension-reason.der carries a CRLReason; verify it surfaces correctly.
* test_certificates_cached – OCSPResponse.certificates is cached behind a PyOnceLock; verify that two successive accesses return the identical Python list object (is-identity check).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Alexander Bokovoy <abokovoy@redhat.com>
Thanks for submitting this -- for ease of review, can you split this into a few smaller PRs? My suggestion would be to start with splitting out:

and we can go from there. Thanks
Yeah since GH hasn’t shipped dependent PRs yet you should just submit one and once it merges rebase and submit the next.
PyCA x509 Rust backend — caching optimizations
I was working on my general-purpose ASN.1 library, and while creating Python bindings I tested against PyCA's code. Some operations in PyCA were slow compared to my code, so I wanted to look into what could be improved. With the help of Claude Code, I improved some repeatable patterns using the same approach PyCA already had in place for some properties.
Below is a report Claude created.
Background
The x509 Rust backend (`src/rust/src/x509/`) converts parsed ASN.1 data into Python objects on every property access. Operations like name parsing (`parse_name`), public-key loading, OID conversion, and serial-number iteration are not cheap: each one allocates Python objects, traverses ASN.1 sequences, and crosses the Rust/Python FFI boundary. In workloads that touch the same property more than once on the same object (chain building, path validation, CRL checking, OCSP processing), this cost is paid repeatedly and unnecessarily.

The existing mitigation is `pyo3::sync::PyOnceLock<pyo3::Py<pyo3::PyAny>>`: a thread-safe, write-once cell that stores the Python object after the first computation. It was already used for extension lists everywhere. The work described here extends that pattern to the remaining uncached properties.

Caching pattern
Every cached getter follows the same idiom:
`get_or_try_init` is a no-op on every call after the first; the atomic check costs ~50 ns, and the cached result is returned without any allocation.

What was implemented
Ten changes were made across five files, committed individually on the `performance-improvements` branch:

| Commit | File(s) | Change |
| --- | --- | --- |
| e7cc638 | csr.rs | `CertificateSigningRequest.attributes` |
| 75451dc | ocsp_req.rs | `OCSPRequest` `issuer_name_hash`, `issuer_key_hash`, `hash_algorithm`, `serial_number` |
| 1830ca3 | certificate.rs, pkcs7.rs, ocsp_resp.rs | `Certificate` `issuer`, `subject`, `public_key`, `signature_algorithm_oid`, `signature_hash_algorithm` |
| e90f4e5 | ocsp_resp.rs | `certificates` iteration; cache the resulting list |
| 986298b | ocsp_resp.rs | `OCSPSingleResponse.extensions` getter with caching |
| f18d144 | crl.rs | `CRL` `issuer`, `signature_algorithm_oid`, `signature_hash_algorithm` |
| d149aaf | crl.rs | `get_revoked_certificate_by_serial_number` linear scan replaced with O(1) `HashMap` |

The `OCSPResponse.certificates` getter additionally had a documented O(n²) bug (each certificate extracted via `clone().nth(i)` restarted the iterator). It was replaced with a single linear pass using `asn1::write_single` to produce independent DER bytes for each certificate, eliminating the need for the `map_arc_data_ocsp_response` unsafe helper.

Benchmark results
Benchmarks measure repeated access on a single pre-loaded object — the workload that caching is designed to accelerate. Each benchmark creates the object once outside the timed loop, then calls the getter in a tight loop.
Comparison: `main` (baseline) vs `abbra-p-f` (PR), both built with `maturin develop --release`, Python 3.14, OpenSSL 3.5. Benchmarks compared:

- certificate_subject
- certificate_issuer
- crl_issuer
- certificate_public_key
- ocsp_request_properties
- crl_serial_number_lookup_miss
- certificate_signature_hash_algorithm
- crl_serial_number_lookup_hit
- certificate_signature_algorithm_oid
- ocsp_response_properties
parse_nameis the most expensive operation — it constructs a full PythonNameobject tree from ASN.1 on every call. The cached path costs only an atomic load plus a Python reference clone (~50 ns regardless of name complexity).crl_serial_number_lookup_hitis 34% faster rather than near-zero becauseget_revoked_certificate_by_serial_numbermust still construct a newRevokedCertificatePython object on each hit (theHashMapstoresOwnedRevokedCertificatevalues that are cloned per call). The miss path (90% faster) avoids iterating the whole list and drops from O(n) to O(1).ocsp_response_propertiesshows no meaningful change because the properties benchmarked there (issuer_key_hash,serial_number,signature_hash_algorithmon the response-level object) were already relatively cheap and the test exercises only a few iterations of the warm path.Why the existing load benchmarks showed no improvement
`test_load_der_certificate` and `test_load_pem_certificate` each call `x509.load_der_x509_certificate(bytes)` per iteration, creating a fresh object with empty caches each time. The cache is always cold; caching adds zero benefit and a tiny overhead (extra `PyOnceLock::new()` fields). These benchmarks measure parsing throughput, not property-access throughput, so they are unaffected by this work.

Benchmark reproduction
```shell
uv venv /tmp/bench-venv --python python3.14
uv pip install --python /tmp/bench-venv/bin/python \
    maturin pytest pytest-benchmark certifi setuptools cffi
uv pip install --python /tmp/bench-venv/bin/python -e vectors/

# baseline (main branch)
git checkout main
cp tests/bench/test_x509.py /tmp/bench_test.py  # copy new benchmarks over
VIRTUAL_ENV=/tmp/bench-venv maturin develop --release
/tmp/bench-venv/bin/python -m pytest tests/bench/test_x509.py \
    -k "subject or issuer or public_key or signature or crl_serial or ocsp" \
    --benchmark-json=/tmp/bench_base.json --benchmark-enable \
    --benchmark-warmup=on --benchmark-min-rounds=200 -q

# PR branch
git checkout abbra-p-f
VIRTUAL_ENV=/tmp/bench-venv maturin develop --release
/tmp/bench-venv/bin/python -m pytest tests/bench/test_x509.py \
    -k "subject or issuer or public_key or signature or crl_serial or ocsp" \
    --benchmark-json=/tmp/bench_pr.json --benchmark-enable \
    --benchmark-warmup=on --benchmark-min-rounds=200 -q

python3 .github/bin/compare_benchmarks.py /tmp/bench_base.json /tmp/bench_pr.json
```